ABSTRACT


An investigation was made into the problem of automating the
telephone directory assistance system. After a review of
computer methods of person identification, studies aimed directly
at automating the telephone directory assistance system were
also introduced. A system based on a special coding method was
studied and presented as a feasible solution to the problem.

Possible extensions to the system were also discussed.
CHAPTER I


GENERAL INTRODUCTION 


Human handling of information has been a problem area in
information processing since the advent of modern digital computers.
It is the slowest facet of a complete data processing system and the
most difficult to improve. The point can best be illustrated by an
example. In a cross-country airline reservation system, data are
transferred rapidly between service centers and the processing
center, and file searches are done in the best-known fashion; yet
most of the delays are caused either by the ticket agent or by the
customer. Many factors contribute to these delays; among the most
common are slow keying (on a CRT keyboard), communication problems
between the customer and the ticket agent, indecision by the
customer, and third-party interruption of the ticket agent.

Obviously, if the human element can be eliminated from the system, 
information processing can be improved many-fold. Although complete 
absence of human operator seems impossible for most systems, the ulti- 
mate goal might be to achieve a minimal level of human intervention. 

Telephone directory assistance is a classic case. A medium-sized 
telephone company serving a population of 500,000 has to employ hundreds 
of operators to answer calls. A bulky telephone book has to be updated 
every month for entering of new listings, deletion of old listings and 
changes of current listings. Expensive electronic equipment has to be 
purchased. Above all, enough space has to be provided for the equipment 
and operators. The process is slow and prone to error. The whole 
operation is costly to set up and maintain.

Many attempts have been made to mechanize the operation of tele-
phone directory assistance without much success. Heavy human involve-
ment is retained because the operators' function in the system seems
to be "either not economical to mechanize or perhaps even impossible
to fully mechanize".*

It is the purpose of this report to propose a telephone directory
assistance system without human operators. By dialing the telephone
on his desk, a caller makes his enquiry and receives a pre-recorded
human voice answer. The technique involved is called "computer-con-
trolled message synthesis" and it was developed at Bell Labs in 1970.
The enquiry is recorded and coded into a search-record, which is
passed to a search program. The search files are kept on-line, for
example on a disk file. The result (a telephone number if the search
is successful, otherwise a message) is transferred to a simulator
which performs the message synthesis.
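
The flow just described can be sketched in a few lines of Python. This
is an illustration only, not the system proposed in later chapters:
the record layout, the names SearchRecord, search_directory and
synthesize_message, the sample listing and the phrase labels are all
assumptions made for the example, and an in-memory dictionary stands
in for the on-line disk file.

```python
from typing import List, NamedTuple, Optional


class SearchRecord(NamedTuple):
    """Coded form of a caller's enquiry (hypothetical layout)."""
    name_code: str


# Hypothetical stand-in for the on-line search file (e.g. a disk file).
DIRECTORY = {"5642": "4391234"}


def search_directory(record: SearchRecord) -> Optional[str]:
    """Return the telephone number for the coded enquiry, or None."""
    return DIRECTORY.get(record.name_code)


def synthesize_message(number: Optional[str]) -> List[str]:
    """Assemble the reply as a sequence of pre-recorded segments."""
    if number is None:
        return ["<no listing found>"]
    # One pre-recorded phrase followed by one segment per digit.
    return ["<the number is>"] + list(number)


def answer_enquiry(name_code: str) -> List[str]:
    """Enquiry -> search-record -> search -> message synthesis."""
    return synthesize_message(search_directory(SearchRecord(name_code)))
```

A successful search yields the phrase followed by the digits of the
number; an unsuccessful one yields a single message segment.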

Chapter II of this report gives a survey of person identification 
techniques, since telephone directory search is a typical person 
identification problem. In this survey six representative and signi- 
ficant studies are introduced. 

The state of the art in telephone directory assistance is discussed
in Chapter III, with the local telephone company (Edmonton
Telephones) used as a study base. Current developments at Edmonton
Telephones are also reported for comparison.

The basis of this study is a special direct-dialed coding method
on a telephone set (DD Code). Details and the justification of this
method are examined in Chapter IV. The approach taken in this study
is clearly different from the conventional approaches reviewed in the
previous two chapters. Three of the name coding methods dis-
cussed in Chapter II (Blair's, Davidson's and SOUNDEX) were also
programmed with different sample sizes, and the results are compared
with DD Code on discriminating power and degree of redundancy.

Chapter V describes the proposed telephone directory assistance
system with detailed file organization and maintenance procedures.
The problem of system performance measurement is also dealt with.
CHAPTER II


THE PROBLEM OF PERSON IDENTIFICATION 


A. Pioneer Work 

Some of the most important pioneer work in the field of
identification of medical documents was done in Britain. In 1948,
Lancelot Hogben, Muriel Johnstone and K. W. Cross of the University
of Birmingham and Birmingham United Hospital were commissioned to
look into the possibility of designing a system of medical docum-
entation that could make provision for the efficient identification
of individual patients. Their basic findings are listed below:

1. birth names are not specific (over a sixth of the entire 
population of England and Wales falls in one of the 50 
most frequent surnames) ; 

2. an initial proposal of a six-cipher code, using patient's 
birth date (day, month, year); 

3. a ten-cipher code, the 4 additional ciphers (2 for sur- 
name, 2 for first name) being distributed to assure 
approximately equal relevance to each of the 10,000 
compartments which 4 ten-row columns accommodate. 

An initial analysis of the run of letters in 88,000 surnames
of a Midland telephone directory was carried out, and surnames were
grouped into 100 blocks of equal frequency with a 2-figure code
for each block. Two ciphers were used to code first names, with
the first digit distinguishing the sex of the individual.
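
The grouping of surnames into blocks of equal frequency can be
illustrated with a short sketch. It is not Hogben's actual procedure,
which is not reproduced here; the function name assign_blocks and the
cumulative-count rule are assumptions chosen only to show the idea
that alphabetically ordered surnames are cut into blocks holding
roughly equal shares of the population.

```python
def assign_blocks(surname_counts, n_blocks=100):
    """Map each surname to a 2-figure block code.

    surname_counts: dict of surname -> number of occurrences.
    Surnames are taken in alphabetical order and a running total
    decides which block a surname falls into, so that each block
    covers roughly total / n_blocks individuals.
    """
    total = sum(surname_counts.values())
    codes = {}
    running = 0
    for name in sorted(surname_counts):
        block = min(n_blocks - 1, running * n_blocks // total)
        codes[name] = f"{block:02d}"
        running += surname_counts[name]
    return codes
```

A very common surname fills a block (or more) on its own, while many
rare surnames share a block; this is what gives a search on the
surname cipher a roughly equal chance of landing in any block.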
Advantages of this method:

1. two individuals with the same first names but different birth
rank have a different code;

2. an initial of the first name will receive a different code
than the full first name;

3. search on surname will have equal chance of a find in one
of 100 blocks;

4. files can be arranged in birth date sequence;

5. unified patient numbering system.


A possible search scheme would be:

1. Arrange records in order of birth date, then name cipher
(birth date ciphers should be arranged in increasing order
of year, month and day).

2. Search on birth date (if there are duplicates, then search
on name ciphers).
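
The search scheme above can be sketched as a sort followed by a
binary search. The tuple layout (year, month, day, name cipher,
payload) and the function name find are assumptions made for this
example; the key list is rebuilt on every call only to keep the
sketch short.

```python
from bisect import bisect_left, bisect_right


def find(records, year, month, day, name_cipher=None):
    """Look up records by birth date, then by name cipher if needed.

    records must already be sorted by (year, month, day, name_cipher);
    each record is a tuple (year, month, day, name_cipher, payload).
    """
    keys = [r[:3] for r in records]
    lo = bisect_left(keys, (year, month, day))
    hi = bisect_right(keys, (year, month, day))
    matches = records[lo:hi]
    # Only fall back on the name cipher when the birth date is shared.
    if len(matches) > 1 and name_cipher is not None:
        matches = [r for r in matches if r[3] == name_cipher]
    return matches
```

For example, with records = sorted([(1948, 3, 1, "1234", "A"),
(1948, 3, 1, "5678", "B"), (1950, 1, 2, "1234", "C")]), the call
find(records, 1948, 3, 1, "5678") returns only the second record.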


B. Phonetic Technique 


1. SOUNDEX* - The Phonetic Name Coding System


SOUNDEX is a name coding system designed to solve some
problems in name indexing. They are:
a. different forms of spelling of the same or similar
surname;
b. errors in spelling;
c. misinterpretation of handwriting;
d. translation of names of foreign origin.
The system is based on the principle that there are
certain key letters (consonants) in the alphabet which
cannot be eliminated from a proper name without making it
into something else. If we retain these letters in a
name compression coding system, we have also retained the
'features' or 'characteristics' of the proper name.

Since names are usually filed as written and found
as spoken, there is a problem in determining the exact
spelling of a name. SOUNDEX codes similar names or var-
iations in spelling into one group.

The merits of the SOUNDEX system can best be demons-
trated by the following examples:


Name Code 
MARAN M650 
MERAN M650 
MIRAN M650 
MORAN M650 
MOREN M650 
MORRAN M650 
MOURAN M650 


SOUNDEX has its shortcomings, too, particularly from 
the viewpoint of computer name coding methods. SOUNDEX 
does not utilize the advantages of speed and precision of 
a computer. 

There are two major difficulties in record linkage 
by names: 

a. names that should not be linked are linked; 


b. names that should be linked are not linked. 


*Note: see also Appendix C for rules of SOUNDEX. 
The SOUNDEX method is especially weak in that names
that are distinctly different are coded the same. The 
system is 'loose' in the sense that it cannot distinguish 
the finer features of names. 

The following are some examples: 

a. names to be distinguished by a vowel are coded
the same in SOUNDEX. This is rather common with
oriental names. For instance:


Name Code 
WONG W520 
WING W520 
WANG W520 


b. names with insufficient consonants will be coded
with zeros, which provide little discriminating
power. For example:


Name Code
HALL H400
HILL H400
HULL H400
WU W000
WA W000
WEI W000


c. most consonants are coded the same. For example:
C, G, J, K, Q, S, X and Z all receive the same code,
code 2.
d. names that should receive the same code are coded
differently (some silent consonants). For
example:

Name Code 


PSHEDEZKY P322 
SHEDEZKY See 


SZCSYBALSKI S222
SABALSKY S142


MJELDE M243
JELDEE J430
JELDE J430


In conclusion, the SOUNDEX system has its strong po-
ints and also its weaknesses. It is recognized that no
system of name indexing is perfect in itself. It is also
clear that the strong points of SOUNDEX are at the same
time its weaknesses. For an ideal system, a combination
of several methods may prove to be more satisfactory,
because with the aid of a fast computer system, complex
algorithms can be implemented and utilized speedily and
efficiently.


2. Atomic Energy of Canada Ltd. 
A group of researchers at the Biology Branch of Atomic
Energy of Canada Ltd. at Chalk River, Canada, has also done
a fair amount of work on computer methods of person identific-
ation. They were concerned mostly with vital record
linkage.


In 1957, Newcombe, Axford and James published a
report describing a technique for extracting, from
routine vital and health records, information on hereditary
influences on health, and for verifying status for
welfare programs. Their pioneer effort was mainly on how
to obtain reliable sources of information concerning the
fine structure of family relationships from individual
vital records.

To reduce the time needed to manipulate a considerable amount
of name information, the SOUNDEX coding method was used
extensively. For example, a full name code could consist
of the following information (a total of 16 digits):

a. SOUNDEX code of father's surname;

b. SOUNDEX code of father's mother's maiden name;

c. SOUNDEX code of mother's maiden name;

d. SOUNDEX code of mother's mother's maiden name.
The advantages are obvious: either an operator can code-
punch directly or a program can be set up to code-punch
names.

Between 1957 and 1965, the Newcombe and Kennedy team
of Atomic Energy of Canada published a number of articles
(references 21-32) on computer methods of vital record
linkage. To summarize their findings:

a. they used the SOUNDEX coding method to facilitate
the process of person identification;


*See also Appendix D on Factors Influencing Choice of Identifiers.
b. a weighting factor is used for different pieces
of identifying information. Assuming that the dis-
criminating power of a particular item of ident-
ifying information depends upon its frequency of
occurrence in the population, a middle name
initial of 'Z' has more discriminating
power than the letter 'J';

c. taking into consideration the combined discrim- 
inating power of all items of identifying infor- 
mation in the document, a greater degree of 
certainty can be achieved. In practice, they 
expressed the discriminating power for various 
agreements and disagreements of the different 
items of identifying information as logarithms 
so as to make them 'addable'. Tables of such 
values were prepared and listed in their report. 
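
The idea of 'addable' logarithmic weights can be illustrated as
follows. This is a simplified sketch, not the tables of the Newcombe
and Kennedy reports: it covers agreements only, takes the weight of
an agreement to be the negative base-2 logarithm of the value's
relative frequency, and uses the hypothetical function names
agreement_weight and total_weight.

```python
import math


def agreement_weight(frequency):
    """Weight, in bits, of an agreement on a value that occurs with
    the given relative frequency in the population (0 < f <= 1)."""
    if not 0 < frequency <= 1:
        raise ValueError("frequency must be in (0, 1]")
    return -math.log2(frequency)


def total_weight(frequencies):
    """Combined weight of several independent agreements: because the
    individual weights are logarithms, they are simply added."""
    return sum(agreement_weight(f) for f in frequencies)
```

A rare value such as the initial 'Z' (say, one name in a thousand,
an invented figure) earns a larger weight than a common one such as
'J', and two agreements with frequencies 1/2 and 1/4 combine to
1 + 2 = 3 bits.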

C. Letter Frequency Approach

Charles Blair of the Department of Defence, Washington,
D.C., published his paper 'A program for correcting spell-
ing errors' in 1960 in Information and Control. It is refe-
rred to as "Blair's method"* in the following discussion.

By abbreviating names but retaining the 'kernel' of the 
original names, misspelled names which retain enough similar- 
ity to the original can be retrieved. Basically, Blair 
recognized that not all letters in a word are equally import- 
ant. If the misspelled word happens to retain the important 


*Note: Blair's method was programmed with various size samples for 
comparison purpose. See details in Chapter IV. 


letters, it should receive the same abbreviation as the corr-
ectly spelled word.

The abbreviation algorithm is as follows:

1. score each letter of the name according to its frequency
of occurrence in English text (a table of frequency was
provided in his paper);

2. score each letter of the name according to its position
(using a table which gives the logarithm of the desirab-
ility of deleting a letter as a function of its position);

3. total the scores for each letter;

4. take the four letters of lowest scores from left to right;

5. the four letters will be the 'abbreviation'.
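
The five steps can be sketched as below. Blair's frequency and
position tables are not reproduced in this chapter, so the two
scoring functions here are invented stand-ins (vowels score high,
the first and last positions score low); only the overall procedure
of totalling two scores per letter and keeping the four lowest, in
their original order, follows the steps above.

```python
VOWELS = set("AEIOU")


def letter_score(ch):
    """Hypothetical stand-in for the letter frequency table."""
    return 5 if ch in VOWELS else 1


def position_score(index, length):
    """Hypothetical stand-in for the position table."""
    if index == 0:
        return 0
    if index == length - 1:
        return 1
    return 2


def abbreviate(name, size=4):
    """Keep the `size` lowest-scoring letters, in left-to-right order."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    n = len(letters)
    scores = [letter_score(c) + position_score(i, n)
              for i, c in enumerate(letters)]
    # Rank positions by total score (ties go to the leftmost letter),
    # then restore the original order of the survivors.
    keep = sorted(sorted(range(n), key=lambda i: (scores[i], i))[:size])
    return "".join(letters[i] for i in keep)
```

With these stand-in tables, MORAN and the variant MOREN both reduce
to MORN, which is the behaviour the method relies on.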

It was found that names with small variations frequently
resulted in the same abbreviation. One observation is that the
choice of four as the size of the abbreviation might not be a
wise one. In fact, the author of this thesis found that 5 letters
from the last name give better discriminating power than just 4
letters. A similar problem was encountered in this project when
attempting to determine the optimal word size allowed for the last
name in order to minimize the duplicates of special codes and at
the same time maximize the discriminating power.#


D. Scoring and Matching

There have been many attempts to deal with the problem of
misspelled words during transmission of information. Miller and
Friedman published in the journal Information and Control in
1958 on the subject of reconstruction of mutilated English text.


#Note: see Tables 4-13 and 4-14, discussion in Chapter IV, Section G.
They claimed that the average person, given limited time to work,
can correct passages reasonably well only if the errors in the
mutilated text are less than 10%; the job is most difficult if it
consists of random substitutions of wrong letters. Their approach is
basically trial and error. According to the frequency of occur-
rence of each letter, substitutions are made. Their findings are
not of direct concern to the name identification problem because:
1. English text was used as test data base. Word meaning
in context can be taken into consideration, whereas
person names do not have such built-in characteristics.
2. Their method is not readily programmable for a digital
computer (no clearly defined algorithm).


E. Name Compression 


The need arose for retrieval of misspelled names in airline
passenger records. Davidson (1962) tackled the problem by way
of name compression. He avoided using the phonetic techniques for
two reasons:

1. The international scope of names to be handled makes the 
phonetic equivalents of certain letters difficult to 
standardize. 

2. The rapid turnover b tangas agent personnel prohibits 
a system requiring training in phonetics. 

He developed a spelling-matching technique which in one sense or
another recognizes the 'essence' of a name despite the variant
forms created by usual or unusual misspellings.

His compression scheme was in fact a well-known name coding
method published by IBM. His search technique was also conven-
tional: first a preliminary search on coded surname; if there is
more than one match, then a search on coded full name; and even-
tually a display of matched records for manual check by an operator.

However, Davidson's research provided an interesting direction 
for further investigations. He derived a routine called "Ill- 
spelled Routine" for error recovery. Essentially, it is intended 
for spelling error correction and it is called for whenever spell- 
ing errors are more significant than 'vowel errors'. There are 
three rules used: 

1. no letter appears in a code name more than twice;

2. space characters in coded surnames are packed at the
right hand end;

3. repeated letters in a coded surname are not contiguous.

In addition, the routine always looks for a string of letters in the
same sequence in both the coded surname and the retrieved record.
This helps to suppress 'noise' caused by unmatched letters.
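
The three rules can be illustrated with a small normalizing function.
This sketch only enforces the rules as stated above on an already
compressed string; it is not Davidson's compression scheme itself,
and the function name apply_code_rules and the fixed field width are
assumptions.

```python
def apply_code_rules(letters, width):
    """Enforce the three rules on a coded surname.

    - a letter is kept at most twice (rule 1);
    - blanks are removed and re-added at the right (rule 2);
    - a letter equal to the one just kept is dropped (rule 3).
    """
    kept = []
    counts = {}
    for ch in letters.upper():
        if ch == " ":
            continue
        if kept and kept[-1] == ch:
            continue  # would be contiguous with its repeat
        if counts.get(ch, 0) >= 2:
            continue  # already used twice
        kept.append(ch)
        counts[ch] = counts.get(ch, 0) + 1
    # Truncate to the field width and pad with blanks on the right.
    return "".join(kept)[:width].ljust(width)
```

Checking both conditions in the same pass matters: dropping a third
occurrence of one letter can otherwise bring two copies of another
letter side by side.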


F. Universal Identifier (UID) 

Person identification is a classical problem in information 
processing because the natural identifier (names) is a poor one 
in terms of uniqueness and discriminating power. There are just 
too many persons in the streets with the same names. Until a 
unique and universal identifier can be given to each individual, 
handling of information pertaining to human beings will continue 
to be a very difficult and frustrating job. A person's name,
although desirable as an identifier, is ineffective and inefficient.

Attempts have been made many times in the last decade to
establish a Universal Identifier (UID). Several European countries
have already adopted some form of UID system to facilitate
processing of huge volumes of data about their citizens. These
include the Scandinavian countries and Great Britain. A number
of others, like Japan and West Germany, are implementing similar
systems.

The UID system of West Germany* is a twelve-digit number
assigned to each citizen, who is thereafter known officially to
government by this number. The twelve digits break down as follows:

1. six digits indicating birthday;

2. one digit for sex and the century of birth;

3. four digits to distinguish one from others born on the
same day;

4. one digit for control purposes.

The Swedish UID is composed of ten digits. The first six
digits indicate the birthdate of the individual, then a three
digit number to distinguish persons born on the same day (odd for
men and even for women), plus a control digit. An earlier version
of the Swedish UID system was introduced in 1947 and the control
digit was added in 1968.
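
The layout of the Swedish number can be made concrete with a short
parsing sketch. The text above does not state how the control digit
is computed; the sketch assumes the Luhn (modulus 10) rule, which is
the rule used for the present-day Swedish number. The function names
and the sample number are for illustration only.

```python
def luhn_check_digit(digits):
    """Luhn (mod 10) control digit for a string of nine digits."""
    total = 0
    for i, ch in enumerate(digits):
        d = int(ch) * (2 if i % 2 == 0 else 1)
        total += d // 10 + d % 10  # add the digits of the product
    return (10 - total % 10) % 10


def parse_swedish_uid(uid):
    """Split a ten-digit number into birthdate, serial and control."""
    digits = "".join(c for c in uid if c.isdigit())
    if len(digits) != 10:
        raise ValueError("expected ten digits")
    serial = digits[6:9]
    return {
        "birthdate": digits[:6],
        "serial": serial,
        # Odd serial numbers are assigned to men, even to women.
        "sex": "male" if int(serial) % 2 else "female",
        "control_ok": luhn_check_digit(digits[:9]) == int(digits[9]),
    }
```

For the commonly quoted sample number 811228-9874, the serial 987 is
odd (male) and the computed control digit is 4, so the number passes
the check.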


G. Conclusion 


Many data processing specialists consider the universal use 


* Note: TIME magazine, July 12, 1971. 


**Note: See Appendix E for details. 
of a UID will be the ultimate solution to many current problems
in information processing. History shows that disregard of the human
aspect of a system is a common pitfall of most potentially great
technological achievements. The fatal mistake was in designing a
great system on paper without adequate knowledge of the needs
of prospective users.

Trying to fit human beings into a system rather than trying 
to fit a system to the needs of human beings has been costly for 
many well intended computer application systems. Failure to 
satisfy the needs of users is the fault of the system analyst, not 
the users. Some will argue that it is too time-consuming and costly 
to go all out and try to serve the users, but the shortcoming is 
of technology, not of human beings. 

In view of the rapid advancement of computer technology,
machines are built to work 10 times faster yet relatively cheaper.
The cost of data processing (in both speed and storage) has been
reduced greatly over the last decade; e.g., the price of disk
units was reduced by about one half.

The problem of person identification is not going to be
solved by the introduction of UID, but rather by more sophisticated
and human-oriented systems. UID is just a dream of 'lazy' data
processing personnel. People just do not want to be identified
by a number.
CHAPTER III


THE PROBLEM OF TELEPHONE DIRECTORY ASSISTANCE 


A. Introduction

Telephone directory assistance has been provided by the
telephone companies to their customers free of charge in the past.
However, due to increasing demand and cost, telephone companies
find that systems used in the 60's can no longer cope with the current
situation. Therefore, a much superior system is definitely needed.
There are three areas which must be improved:

1. response time;

2. cost;

3. reliability.

In view of the above criteria, a computer-assisted directory
system seems to be the most logical solution. Not only can it
provide faster and more accurate answers to calls, but printing of
bulky directory 'updates' can also be eliminated.

There are at least four cities currently engaged in the
study of computer-assisted directory systems:

1. Oakland, California;

2. New York, New York;

3. Copenhagen, Denmark;

4. Edmonton, Alberta.

There is a prototype being tested in Oakland, California.
The system features CRT terminals to display the 'hits'. 'Hits'
are records that are found with the same major key, the last name.
Last names are keyed in by the operator using the CRT keyboard.
If there is more than one 'hit' displayed on the CRT screen, the
terminal operator then asks the caller to supply further infor-
mation (e.g., first name, address) that may help to identify the
desired number. When the correct record is identified, the
operator quotes the telephone number (one number only). No
numbers are to be given if the operator fails to find a single
unique record.


B. Edmonton Telephones 


Currently at Edmonton Telephones there are 48 stations in 
the Directory Assistance Department. There are 130 operators 
employed, working three full shifts, with an annual budget over 
$780,000. 

The Directory Assistance Department is also responsible for
providing this service to Northern Alberta. The most signi-
ficant aspect of this service is that it is free. It was estimated
that once a charging scheme was adopted, demand on such a service
could drop to only 40% of the current rate. An extensive survey
has been done by the Directory Assistance Department of Edmonton
Telephones. There are several interesting findings:

1. currently, average response takes about 20 seconds;

2. it seems just four letters (three from the last name, one
from the first initial) are sufficient to identify a
person;

3. if there is more than one 'hit' during the search, more
information is asked for (e.g., address);

4. customer misspelling causes difficulties (as also do
short names instead of full);

5. current equipment is not capable of handling the demand
projected for 1974;

6. therefore computer-assisted directory search is deemed
inevitable;

7. this service is used for business numbers 70% of the
time;

8. directory assistance department policy is only one
number to be given even when there are two possible
numbers;

9. there is a 'most-frequently-called' number list (it is
learned by word-of-mouth, and updated irregularly);

10. there are approximately 200,000 records on file at
present.
The Directory Assistance Department at Edmonton Telephones
is actively engaged in the study of a computerized directory
assistance system. The solution they are seeking would have the
following features:

1. a long-term solution;

2. one that can cut response time by half (current manual
system takes about ten seconds per call on the average);

3. possible reduction of total operating cost.

Edmonton Telephones is considering adopting Oakland's system
with some modifications. They are still in the planning stage and
expect completion in 1974.


Basically, the system visualized by Edmonton Telephones is as
shown in Diagram 2-1 (Hardware Configuration).


The master file is stored on a dedicated drum; the CRT termi-
nals communicate with the UNIVAC 1106 main frame, and there is one
master terminal for control purposes.


System characteristics:

1. human operators still needed;

2. 'hits' are displayed on CRT screen;

3. totally redesigned functional keyboards;

4. operator responds (human voice);

5. can also handle conventional intercept calls*;

6. two separate files are maintained - business numbers file,
and residence numbers file;

7. to support approximately 40-50 CRT terminals;

8. printed statistics to be produced every 30 minutes for
control purposes (current number of calls, etc.);

9. three types of transactions - regular changes (correction)
through a monitor terminal (master terminal), inter-
terminal communication (to help handle difficult cases),
update of the frequently-called numbers list;

10. response time at around ten seconds;

11. save on printing of telephone directory updates;

12. a maximum of 42 hits are accommodated, on 3 CRT screen
pages (14 lines per screen page);

13. language used is FORTRAN V;

14. master file stored on a dedicated drum, on-line 24 hours.


* Intercept calls are calls for numbers dialed which are not in service.
The file shall be in alphabetical sequence with 3 fields in each
record. The record format is basically the same as a printed telephone
directory. The three fields are:

1. name: three letters from last name plus first name initial;

2. address: may adopt the Assessor Department's Roll Number file
(City of Edmonton) address coding method - the
4-3-2-1-5 method;

   home number: 4 3 2 1 5
   e.g.  10510 - Jasper Ave., Apt. 20
         coded as 10510/JASP/AVE/ /20

         12 Sir Winston Churchill Square
         coded as 12/SIR /WIN/CH/S

3. telephone number: seven-digit number.

At the time this thesis is prepared, the Edmonton Telephones Direc-
tory Assistance System is still at its planning stage. Therefore,
there is no further information available.


C. Bell Telephone System


Bell Telephones is perhaps the most active in the research into
a better directory system to replace a manual system. Their past
attempts like 'microcards' and 'microsticks' were not widely used
because the records were expensive to update and the saving in time
compared with a printed directory was negligible.

In 1968, an Automatic Intercept Service invented by Winkleman
was announced by Bell Telephone Labs. The system is
designed to answer calls to non-working numbers. Pre-recorded
phrases and digits are put together to tell callers what number
has been dialed, report the status of the telephone and give a number
where the party can be reached. Similar systems were installed
in 25 different cities.

Trupp, also of Bell Telephone Labs, published a paper
in 1970 on computer-controlled message synthesis which confirms
that a computer-assembled message is feasible. Winkleman's system
worked quite well with intercept calls because a search of a file
for a phone number always produces a unique record. The signifi-
cant result is output with simulated voice. Trupp's research
seemed to be the answer.

The problem with any telephone directory assistance service 
is that a search with names (most readily available information 
from a caller) generally does not produce unique results. To 
request the phone number of Mr. John Smith would mean a possible 
choice out of too many. Therefore multiple keys are needed, and 
a multiple search has to be performed to uniquely identify a rec- 
ord. The same task performed by an operator takes time and is 
error-prone. 

The research results of both Winkleman and Trupp are used as
building blocks in the proposed Automatic Directory Assistance
System in this paper. The issue here is that if a computer can
control and synthesize messages (no human operators), is it also
possible for the computer to accept calls directly from the callers?
If this is possible, then a truly automatic system can be built.
The theme of this study is to investigate the feasibility of
directly dialed messages (names, address etc.) to be accepted and
'understood' by the computer. Details can be found in Chapters IV
and V.
D. Rothrock's Proposal


Rothrock published his doctoral thesis on Computer-
assisted Directory Search in 1968. In his study, a model of man-
machine interaction was constructed to illustrate a directory
assistance system in which the optimal operator keying strategy
was the prime concern. Since the employment of human operators
is a major part of this system, all the disadvantages of such a
system discussed above still apply. However, his detailed analysis
of the telephone directory and the distribution of the descriptors
(items of information to identify an individual listing), and the
pattern of customer requests are of vital importance to this study.
In fact, three significant points were of particular interest to
our study:

1. in North America, any city of medium to large population
would have similar distribution of family names;

2. on searching strategy, the use of any more than five
letters from the last name does not increase the dis-
criminating power significantly;

3. on customer request pattern:
a. over 70% are for business listings;

b. of all descriptors from a directory, only four of them
are likely to be given by a customer (listed name,
next name or business type, house number, street num-
ber).

In Rothrock's study, file organization, maintenance and file
update were not of prime concern. Rather, various keying strategies
as well as patterns were examined and compared. Two optimal keying
strategies were devised:

1. for residential listings:

a. 3LN + 3SN (preferred), which means the combination of
the first 3 letters from the last name and the first
3 letters from the street name;

b. 4LN + FI (if SN is not furnished), which means the
combination of the first 4 letters of the last name
and the first initial.

2. for business listings:

a. 3FN + SN, which means the first 3 letters from the
finding name (first name of the company) plus the
first letter of the street name.
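
The keying strategies can be written out as small key-building
functions. The function names, the upper-casing and the removal of
non-letters are assumptions made for this sketch; only the letter
counts (3LN + 3SN, 4LN + FI, 3FN + SN) come from the description
above.

```python
def _letters(text):
    """Upper-case the text and keep letters only."""
    return "".join(c for c in text.upper() if c.isalpha())


def residential_key(last_name, street_name=None, first_name=None):
    """3LN + 3SN when a street name is given, otherwise 4LN + FI."""
    ln = _letters(last_name)
    if street_name:
        return ln[:3] + _letters(street_name)[:3]
    if first_name:
        return ln[:4] + _letters(first_name)[:1]
    raise ValueError("need a street name or a first name")


def business_key(finding_name, street_name):
    """3FN + SN: three letters of the finding name plus the first
    letter of the street name."""
    return _letters(finding_name)[:3] + _letters(street_name)[:1]
```

For instance, a request for Smith on Jasper Ave. would be keyed as
SMIJAS, and the same Smith with only the first name John as SMITJ.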

Also, he proposed a search strategy, based upon the keying 
strategies he established. The concept of dividing power was 
defined and used as a measurement for descriptor-discriminating 
power. 

The retrieval of listings is controlled by the computer
in such a fashion that as the operator enters the codes, they are
examined by the computer to determine the 'adequacy' for achieving
a 'desirable' number of listings. This is referred to as
'AUTOSTART' in his thesis. The procedure assumed for his model
is that the operator will continue to key descriptors according to
general keying rules until either:

1. AUTOSTART occurs (15 or fewer listings will be returned
and then displayed on the CRT); or

2. no further descriptors are available and the START key is
pressed.
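
The keying procedure just described can be sketched as a simple loop. This is a minimal illustration, assuming a caller-supplied function `count_matches` (hypothetical) that reports how many listings match the descriptors keyed so far; only the threshold of 15 listings comes from the text.

```python
AUTOSTART_THRESHOLD = 15  # from the text: 15 or fewer listings are displayed


def key_until_start(descriptors, count_matches, threshold=AUTOSTART_THRESHOLD):
    """Key descriptors one at a time until AUTOSTART or manual START.

    count_matches is a hypothetical callback: it takes the list of
    descriptors keyed so far and returns the number of matching listings.
    """
    keyed = []
    for descriptor in descriptors:
        keyed.append(descriptor)
        if count_matches(keyed) <= threshold:
            return "AUTOSTART", keyed   # few enough listings to display
    return "START", keyed               # no descriptors left; operator presses START
```

The sketch does not model AUTOSTOP; a fuller version would reject an inadequate combination before searching.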


An operator can strike the START key erroneously after an
inadequate combination of descriptors has been keyed. To provide a
kind of 'screening' function, 'AUTOSTOP' can be implemented for
the purpose of minimizing time-consuming ineffectual searches. It
also detects inadequate descriptor combinations and returns
appropriate messages to the operator to indicate the problem.

Implementation criteria for AUTOSTART and AUTOSTOP were discussed
briefly in his report. For AUTOSTART:


1. establish a threshold for minimum number of listings; 

2. use an index sequential file to reduce search effort;

3. must accommodate changes and updates on files; 

4. must be responsive to external conditions (e.g., traffic 
measurement can be taken into consideration during 
listings searches) ; 

5. time spent on AUTOSTART decision must greatly reduce 

retrieval time; 

6. must be economical to implement; 

7. must not alienate human operators. 

E. Conclusion

A survey of present studies in Directory Assistance Systems
revealed a key common feature: human operators are still
actively engaged in functioning with the hardware and software
system as an integral unit. The reasons were:

1. nobody ever considered a direct dialing method;

2. human intuition may help solve decision problems;

3. natural human voice output may be more acceptable to the
public.

Whether these reasons are still valid today is debatable. In
view of the fast pace of society and demand on the service of a
Directory Assistance System, a breakthrough in design ought to
anticipate a much improved system.

It is the purpose of this report to outline a Directory
Assistance system which attempts to solve the problem without the
intervention of a human operator and also to give improved
efficiency and reliability. Input is accepted through dialing on
a telephone and a human voice is simulated as output.
CHAPTER IV


DIRECT CODING OF ALPHANUMERIC INFORMATION VIA A TELEPHONE SET 


The theme of this study is to design a telephone directory
assistance system without human operators. There are basically three
phases of the system:

A. All enquiries are dialed in on a standard telephone set (see
Diagrams 4-1 and 4-2 of touch-tone and standard rotary dials),
using the special characters * and # as message delimiters (note
the proposed changes on the Diagrams to include the characters
Q, Z and blank). In this way, names utilizing a 27-character
alphabet are translated into codes utilizing 9 digits. The
translation algorithm is in fact the standard telephone dial
(plus the proposed changes), hence the name chosen for the
system: Direct-Dialed Code or DD Code. The coded message
is then passed to a search program.

B. The search program will do a generic search on an indexed
sequential file (details in Chapter V). The result of the
search, a phone number or a message if there is no match, shall
be passed on to a message synthesizer.

C. A simulated human voice generated by the message synthesizer
will be heard by the caller over the telephone telling him
whether a phone number has been found with the given
information, and the number if found.

*Note: The proposed changes to the telephone dials shown in Diagrams
4-1 and 4-2 (both touch-tone and rotary types) are to include
the characters Q, Z and blank, which are absent on current
dials. It is suggested key 1 should be used for these.

DIAGRAM 4-1: A touch button phone keyboard (with proposed changes)

DIAGRAM 4-2: A Conventional Rotary Telephone Dial (with proposed changes)



This research is based on the fact that the alphabetic surname and
its DD Code representation have a near one-to-one correspondence. No
rigorous mathematical proof can be given; however, empirically, it can
be demonstrated that the assumption is very close to truth (shown in
Tables 4-1 to 4-12 and discussion in section C of this chapter).

Tables 4-1 to 4-12 are constructed to show the amount of redundancy
introduced by using the DD Code rather than the alphabetic names. In
each sample (size N), comparisons were made on different numbers of
characters (first column - 'LAST NAME') to determine the number of
codes that were duplicated (second column - 'SAME CODE') and the number
of duplicated names (third column - 'SAME ALPH'). These, subtracted
from the sample size, gave respectively the number of UNIQUE CODES
(sixth column) and UNIQUE NAMES (seventh column). The difference
between the number of duplicated codes and the number of duplicated
names (fourth column - 'DIFFERENCE SC-SA') is then expressed as a
percentage of N, the sample size (fifth column - 'REDUNDANCY IN PERCENT'),
to give a measure of the increase in redundancy owing to the translation
into the DD Code. The numbers of unique codes and unique names are
also given as a percentage of sample size in the eighth and ninth columns
respectively.

Thus, in Table 4-1, for a sample size of 2000, comparing on 10
characters (first row) gives an increase in redundancy of 0.95%, as
also does a comparison on 9 characters. This redundancy increases
rapidly to 4.25% for a comparison on 5 characters (bottom row).
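
The redundancy measure used in the tables can be sketched as follows. This is a minimal illustration under stated assumptions: the key layout for digits 3 to 7 is taken to be the standard dial, names are blank-filled to the comparison size, and the function names are hypothetical rather than those of the original test programs.

```python
# Letter groups per dial key; key 1 carries the proposed Q, Z and blank.
# Groups for keys 3 to 7 are assumed to follow the standard dial.
KEY_GROUPS = {
    "1": "QZ ", "2": "ABC", "3": "DEF", "4": "GHI", "5": "JKL",
    "6": "MNO", "7": "PRS", "8": "TUV", "9": "WXY",
}
LETTER_TO_DIGIT = {ch: d for d, group in KEY_GROUPS.items() for ch in group}


def normalize(name, size):
    """Keep dialable characters, truncate to size, and blank-fill."""
    kept = "".join(ch for ch in name.upper() if ch in LETTER_TO_DIGIT)
    return kept[:size].ljust(size)


def to_code(field):
    """Translate an already normalized field into its DD Code digits."""
    return "".join(LETTER_TO_DIGIT[ch] for ch in field)


def redundancy_percent(names, size=10):
    """(SAME CODE - SAME ALPH) as a percentage of the sample size N."""
    n = len(names)
    fields = [normalize(name, size) for name in names]
    same_alph = n - len(set(fields))                  # duplicated names
    same_code = n - len({to_code(f) for f in fields})  # duplicated codes
    return 100.0 * (same_code - same_alph) / n
```

On this definition, a difference of 19 between duplicated codes and duplicated names in a sample of 2000 gives 19 / 2000 = 0.95%.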
A. DD Code Coding Method (Direct-Dialed Code)

For any coding method, there are two criteria to be considered:

1. the same code is generated for the same names;

2. different codes are generated for different names.

Mathematically, this is referred to as one-to-one correspondence.

The coding method proposed here (Direct-Dialed Code, utilizing
the standard telephone dial with minor changes) will generate
the same code for the same name; however, sometimes different names
may receive the same code (see Appendix F for examples). In fact,
this is a common characteristic of all existing name compression
and name coding methods. However, finding a coding method which
has a one-to-one mapping relation between the code and the datum
might not be all that important if the duplication can be controlled.
That is, the focal point of investigation should be on how to control
these undesirable 'duplications' or 'redundancy' to a level that is
feasible and efficient to work with, hence allowing practical
applications of the coding method.

After a review of some of the representative name coding and
name-compression methods currently being employed or seriously
studied, an important fact comes to light: no code compression
technique as yet can generate unique codes. DD Code has the
same disadvantage, but since these 'duplications' only occurred on
a small percentage basis, as shown in Tables 4-1 to 4-12, it is
still feasible and also very practical to use such a coding scheme
for person identification purposes.


B. Advantages of a System Using DD Codes 


Files could be organized into two main files (an R file for
residential and a B file for business listings), which would
further reduce search time and errors. For a business file, the most
powerful single item of information is the business type, because
hardly any company will use another's name in the same line of business.
However, since there is, at present, no workable business
classification scheme available, only the name and address details are
considered here.

The advantages of such a system are obvious:

1. minimal human intervention (no operators);

2. fewer errors (minimal manual handling of information);

3. no demands on the system while the caller is thinking
or keying information;

4. reduces social problems (e.g. undesirable calls directed
to the operators);

5. faster response (search time greatly reduced);

6. easy update on files (computer files);

7. no handling of bulky telephone books;

8. eliminates possible human conflicts (impatient callers, or
operator in a bad mood);

9. space saving (compact machinery vs. operator stations
currently used);

10. automatic charge accounting (charging for this service is
inevitable in the future);

11. possible implementation of a cross-country directory
assistance system network;

12. easy implementation, as a telephone set is available in
most North American homes.
C. Uniqueness of DD Codes

To justify this claim, studies were carried out on several
different files to examine the uniqueness of DD Code generation
(or, equivalently, the degree of duplication). Ideally,
in order to test the validity and efficient application of the DD Code
technique, a telephone directory of a medium to large population
should be used as data base. However, since there is none readily
available for machine input and the task of creating one is too
costly and time-consuming, random samples of name files were used
instead. The findings in each case are discussed below:

1. Edmonton Telephone Directory

A survey of the names listed in the Edmonton telephone
directory revealed a very interesting fact. The most
common family name (Smith) has nearly 1000 listings (933),
both residential and business. With DD Code, no other
family name can generate the same code as Smith. A
test program was written for this purpose. The 10 most
popular English family names were tested and the result
is convincing. Only one of them (Brown) generates a
code that has 'duplicates'. That is, the name Crown and
the name Brown are coded the same using the DD coding
method. However, in view of the infrequency of occurrence
of the name Crown (there is only one residential listing
of Crown), this 'duplication' has very little effect on the
efficiency of search on files coded by the DD method. That
means even though the one-to-one correspondence between
names and their DD Codes breaks down in certain cases, the
effect on search efficiency of the system is negligible.
Furthermore, if the caller can furnish more information other than
just the family name (first name, address) to increase the
discriminating power, this problem can be overcome easily.

2. Patient File

A file of 2,000 patient names was used as a data base to test the
discriminating power of the DD coding method. Since the names were
extracted from a local hospital, and the only reason these
names were on this file was that the patients were sick once and were
treated at that hospital, the file is random enough as far as name
variations go. Therefore, we can assume it represents a good
cross-section of the different family names of the city and a
good random sample from the telephone directory. The result
shows that with a data base of 2,000 names there are only 19
'duplicates' if only last names are used, which is about 0.95% of
the total, as shown in Table 4-1. Even if we allow only seven
letters of the last name, there are 21 duplicates, which
represents 1.05% of 2,000 names.

Checking for 'duplicates' using both last name and first
name in DD Code is undesirable because the first name is not
available for most of the listings.


3. Student Application File

In the study on the student application file, there were a total of
664 applications. But because some students filed more than one
application to different departments, some 'housekeeping' had to
be done on the file. That is, duplicate applications were
deleted from the file. Birthdays of applicants with the same last
names and first names were checked. If the birthdays are the same,
it is highly unlikely they are two different persons. The new count
is 6644. That is a significant step in this study because
the inclusion of these duplicate applications would certainly
increase the count of 'duplicates', thus presenting a false
picture of the discriminating power of DD Codes.

Again, we can use similar arguments. Since student applications
consist of a good cross-section of names of the city (even
of the province), we can assume it is a good random sample from
the telephone directory. Using only last name DD Codes, there are
106 'duplicates', or about a 1.6% probability of getting duplicates.
The surprising fact is that, using both last name and first name
DD Codes, there are no 'duplicates'. Even if we reduce the number
of letters of the last name DD Code to seven, there
are only 115 'duplicates', or a probability of 1.73%. The most
significant result was found when using the first initial DD Code with
the last name DD Code. There are no 'duplicates' even when using only
6 letters of the last name and the first initial in DD Codes (Tables 4-3
and 4-4).

To summarize the above discussions, the problem of 'duplicates'
caused by the use of the DD coding method is really insignificant.
Effectively, what will happen is that under DD coding a telephone
directory will consist of fewer family groups but more 'family
members'. In the case of the student application file, with
alphabetic names, 71.4% of the total is of different family names;
whereas after DD coding only 68.82% of the total is of different
names. This shows a shift of only 2.65%.

Another interesting fact is that by comparing the patient
file and the student file, it is found that with about a 3.3 times
increase of file size, the increase of duplicates is about 1.6
times. This suggests that although 2000 is not sufficient to
cover most family name variations, the increase of 'duplicates'
is not in linear proportion to sample size. Different student
application files of 6619 and 5856 entries were also tested and the
results were compared with those collected from 6644 entries; the
results are very close, which suggests that a sample size over
5000 would give stable and unbiased results. This stability
aspect of the redundancy shall be discussed in more detail in
section F of this Chapter.

4. Student Registration File

In order to obtain a large sample for testing of the redundancy
of DD Code, student registration files for four years were
merged together to form a single student registration file. Its
size is 10038 records. Data retained are last name, first name,
and corresponding DD Code. Further, this file was merged with all
the student applications mentioned in Section C.3 above to form an
even larger file with 14556 records. Steps were taken to ensure
that no duplicate records appeared in this file, thus preventing
inaccurate redundancy measurements.

The Edmonton telephone system has approximately 200,000
listings, of which fewer than 2/3 are residential, giving about
140,000 residential listings in a city of half a million. We feel
a sample size of 14556 is large enough for testing purposes.
Redundancy introduced by the DD Code based on these two samples
was studied in the same manner as above, as shown in Tables 4-9 to 4-12.
Redundancy for last name only is 2.1% with a sample size of 14556
and is 0.048% for last name and first three letters of the first
name.


D. Optimal Size of Last Name 


To study the optimal size of name for use with the DD Code, 
a diagram was prepared (Diagram 4-3) to show the relation between 
redundancy and the number of letters used. The three curves 
represent three different sample sizes, namely 14556, 10038 and 
6644. 

It is obvious that by increasing the size of last name in DD 
Code we are minimizing the number of 'duplicates'. However, it 
is evident from Diagram 4-3, that once more than seven letters are 
allowed for last name, an additional letter does not reduce the 
number of 'duplicates' significantly. In fact, seven letters gives 
approximately the same discriminating power as ten letters. 
Therefore, the optimal size for last name DD coding may be set at
7 coded digits.*

*Note: Most name coding studies showed 6 letters to be the optimal
size. Reference 16, Identification Techniques, IBM publication.
DIAGRAM 4-3: Redundancy vs Number of Letters Used

E. Choice of Letters from Last Name and First Name
(Tables 4-15 to 4-20)

A study of the selection of letters showed that the best
strategy is to use both last name and first name. For example,
data extracted from Tables 4-18, 4-19 and 4-20, using a total of
11 letters with different combinations from first and last names,
gives the following:

Number of letters     Number of letters
from Last Name        from First Name       Redundancy
      10                     1                0.309%
       9                     2                0.103%
       8                     3                0.048%

In fact, using only six letters from the last name with three
letters from the first name only increases redundancy to 0.055%.

Therefore, the best letter selection strategy is to use 8
letters from the last name and 3 letters from the first name
using the DD coding method.
F. Sample Size and its Effect on DD Code Redundancy

As expected, the redundancy of the DD Code increases with
sample size, though less than proportionally. As shown in Table 4-21
(compiled from Tables 4-1 to 4-20, in summarized form*), using 10
letters, the redundancy of DD Code increased from 0.95% to 2.11%
(about double), whereas sample size increased from 2000 to 14556
(about 7 times).

An attempt to calculate the optimal redundancy rate was abandoned
due to insufficient data. However, by examining the rate of
increase of the redundancy measure, the maximum redundancy was estimated
to be approximately 2.5%. We can expect the redundancy rate to
become stable because the variation in family names becomes stable.
According to Rothrock, any city of medium to large
population in North America would have a similar distribution of
family names.

*Note: Diagram 4-4 shows the same data as Table 4-21, in graphical form.

DIAGRAM 4-4: Redundancy vs Sample Size, Using 10 Letter Code
G. Comparison of Different Name Coding Methods
(against DD Code)

To compare the relative merits of different coding methods,
three other methods were programmed with sample sizes of 10038
and 14556. The redundancy of each was measured and compared with
the redundancy generated by the DD coding method (using the same
number of letters). Only results obtained from the sample size of
14556 were tabulated. The results as compared with DD Code are:

1. SOUNDEX (Table 4-22)

DD Code is a superior coding method, in addition to all
the other advantages discussed in Section B of this Chapter,
Advantages of a System Using DD Codes.

2. Davidson's Method (Table 4-23)

Davidson's Method appears to be superior based on just
discriminating power, but because its selection process
must start with the full last name, it cannot be
implemented in a direct dialed fashion as DD Code can.

3. Blair's Method (Tables 4-13, 4-14 and 4-24)

From Table 4-24, it is clear that Blair's Method has
higher discriminating power than DD Code because it simply
generates codes of lower redundancy. Again, by the same
argument as above, Blair's Method is not suitable for
direct dialed use. An interesting point is that Blair's
Method actually generates codes with higher discriminating
power than the original alphabet when fewer than 7 letters
are selected (see Tables 4-13 and 4-14).
TABLE 4-24     N = 14556     DD Code vs Blair's Method


Redundancy Measure (last name, varying size) 


Note: negative value indicates less redundancy with respect to 
original alphabetic names. 
In conclusion, the merits of the DD Code do not lie in its
discriminating power alone, but rather in its closeness to the
original names. Above all, DD Code is the only coding method that
can be implemented directly on a conventional telephone.

A list is produced in order to show examples of names that
cause redundancy in DD Code (Appendix F). For example, names like
Gill and Hill are coded the same because the letters G and H occupy
the same key on the standard telephone dial. The use of the DD
Code renders the concept of 'family name' inappropriate; we may
now consider a 'family class', which includes all last names that
receive the same DD Code, rather than the conventional 'family name',
and all these last names may be considered 'equivalent'. The
problem then is that a Mr. John Hill would be confused with a Mr. John
Gill. However, this problem is no worse than having two listings of
John Hill or two of John Gill on the file. In both cases, further
information (addresses) is needed to identify them. In Sections C and E
we have shown that redundancy is reduced drastically when 3 letters
of the first name are used (0.048%). There is no doubt that if
addresses are furnished, a person can be uniquely identified using
the DD coding method.

The list in Appendix F, although long, is included for reference
and for further analysis of the extent of DD Code redundancy.
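
The notion of a 'family class' can be illustrated by grouping names on their DD Code. The sketch below is an assumption-laden illustration and not the program used to produce Appendix F: the function names are hypothetical and the letter groups for keys 3 to 7 are assumed to follow the standard dial.

```python
from collections import defaultdict

# Key 1 carries the proposed Q, Z and blank; keys 3 to 7 are assumed
# to carry the standard dial letters.
KEY_GROUPS = {
    "1": "QZ ", "2": "ABC", "3": "DEF", "4": "GHI", "5": "JKL",
    "6": "MNO", "7": "PRS", "8": "TUV", "9": "WXY",
}
LETTER_TO_DIGIT = {ch: d for d, group in KEY_GROUPS.items() for ch in group}


def dd_code(name):
    """Translate a name into DD Code digits, skipping undialable characters."""
    return "".join(LETTER_TO_DIGIT[ch] for ch in name.upper() if ch in LETTER_TO_DIGIT)


def family_classes(names):
    """Return only the codes shared by more than one distinct name."""
    classes = defaultdict(set)
    for name in names:
        classes[dd_code(name)].add(name.upper())
    return {code: sorted(members)
            for code, members in classes.items() if len(members) > 1}
```

Applied to the names Gill, Hill, Brown, Crown and Smith, this groups Gill with Hill and Brown with Crown, and leaves Smith in a class of its own.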
CHAPTER V

A PROPOSAL FOR AN
AUTOMATIC TELEPHONE DIRECTORY ASSISTANCE SYSTEM

A. General System Description

The objective of this study was to get some feel for the
design of an automatic telephone directory assistance system based
on the concept of DD Codes. A batch model with an indexed
sequential file for master file and a generic search was constructed and
tested on the CDC 3170/MASTER operating system at the Northern
Alberta Institute of Technology. The lessons learned from it are
presented below along with a description of the model.

The system is composed of 4 major components:

1. a hardwired electronic CONVERTER that accepts dialed-in
enquiries through telephone sets (therefore in DD Code
automatically);

2. an electronic MESSAGE-SYNTHESIZER* that generates simulated
human voice output;

3. a MASTER file on direct access storage;

4. a SEARCH program that receives coded search records (in
DD Code) from the CONVERTER and performs record matching
on the MASTER file.

*Note: The basic idea of a Message-Synthesizer is to store
prerecorded phrases on a drum type storage device. Controlled
by a computer, different combinations of phrases can be
synthesized into a sentence and then output through a telephone
receiver. Details are given in Trupp's paper and Winkleman's paper.
1. CONVERTER
The CONVERTER accepts dialed-in messages and stores
them in a buffer until all required fields are specified.
There are six fields for a generalized system and five
fields for a basic system. They are:

a. locality - city or district name (for a generalized
system only);

b. class - residential or non-residential (the latter
includes government agencies and commercial listings,
etc.);

c. last name - first N letters of the last name, blank
filled if fewer than N. The value of N is to be
determined for the individual implementation. (The
present study tested values of N from 5 to 10, and
showed that N should be greater than or equal to 7.)

d. first name - first M letters of the next name most
frequently used (values of M from 1 to 3 were tested);

e. house number - 5 digits, e.g. 13204;

f. 'street' number - 3 characters, e.g. 103, JAS
(alpha or numeric as appropriate).

Input messages are coded in numeric DD Code according to
the following scheme:

1 for Q, Z, and blanks
2 for A, B, C
3 for D, E, F
4 for G, H, I
5 for J, K, L
6 for M, N, O
7 for P, R, S
8 for T, U, V
9 for W, X, Y
This, of course, is merely the standard code already
in use on telephone dials, with the addition of the
characters on the 1 button.
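
The coding scheme amounts to a table lookup, sketched below. This is a minimal illustration rather than the CONVERTER's actual logic: the function name `dd_code` and its `size` parameter are hypothetical, the letter groups for keys 3 to 7 are assumed to be those of the standard dial, and characters that cannot be dialed (hyphens, apostrophes) are simply dropped, which is an assumption.

```python
# Key 1 carries the proposed additions Q, Z and blank. The groups for
# keys 3 to 7 are assumed to be the standard dial letters.
KEY_GROUPS = {
    "1": "QZ ",
    "2": "ABC", "3": "DEF", "4": "GHI", "5": "JKL",
    "6": "MNO", "7": "PRS", "8": "TUV", "9": "WXY",
}
LETTER_TO_DIGIT = {ch: digit for digit, group in KEY_GROUPS.items() for ch in group}


def dd_code(text, size=None):
    """Translate text into DD Code digits.

    If size is given, the field is truncated to size characters and
    blank-filled (blank is coded as 1) when shorter.
    """
    kept = [ch for ch in text.upper() if ch in LETTER_TO_DIGIT]
    if size is not None:
        kept = kept[:size] + [" "] * (size - len(kept))
    return "".join(LETTER_TO_DIGIT[ch] for ch in kept)
```

With this table, SMITH is coded as 76484, and BROWN and CROWN both become 27696, which is the duplication noted in Chapter IV.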
On a Touch-tone telephone set there are twelve buttons, and
one of the two extra buttons (*, #) can be used as a delimiter if
so desired. This is to reduce time wasted on keying blanks for
filling up a field. The zero button is not needed in this application;
however, it can be used as a 'delimiter' on the conventional rotary dial.
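
How a delimited enquiry might be split into fields can be sketched as follows. The text only states that * and # serve as delimiters; the convention used here (* between fields, # at the end of the message), the field order, and the function name are all assumptions made for illustration.

```python
# Field order for a basic (single-locality) system, as assumed here.
FIELDS = ("class", "last_name", "first_name", "house_number", "street_number")


def parse_enquiry(dialed):
    """Split a dialed message into named fields.

    Assumed convention: '*' separates fields and '#' ends the message.
    Fields the caller did not supply are returned as empty strings.
    """
    body = dialed.split("#", 1)[0]
    parts = body.split("*")
    if len(parts) > len(FIELDS):
        raise ValueError("too many fields in enquiry")
    parts += [""] * (len(FIELDS) - len(parts))
    return dict(zip(FIELDS, parts))
```

An empty field can then be treated as masked by the search program, so the caller does not have to key blanks.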
2. SEARCH - The search program accepts encoded information from
the CONVERTER and a search is done on an on-line
master file to find a match for the inquiry.
a. If there is a match, it will output the telephone
number through the MESSAGE-SYNTHESIZER.
b. If there is no match, a message to this effect
is output.
c. If there is more than one match (i.e. the given
information cannot uniquely identify a record), it will
output a message to ask for a more specific inquiry.
Output from either a, b, or c is passed on to the
MESSAGE-SYNTHESIZER.

3. MESSAGE-SYNTHESIZER - Output from the search will cause a
prerecorded human voice to be played and heard by the caller
on the telephone.

4. MASTER-FILE - The system Master file contains all the
telephone subscriber listings and is stored on a direct access
storage device (disk or drum or data cells). It is
subdivided into two files:

- an R-file for residential listings;

- a B-file for all business and governmental listings.

For a more general system (used for inter-city telephone
directory assistance service) a higher level of identifier can be
added: a locality list, the purpose of which is to identify
different regions.


B. FILE STRUCTURE, FILE ORGANIZATION AND DESCRIPTION
There are three levels of list that make up the master file
for the generalized system (a basic system will have all but the
Locality List).

1. Locality List: A list of cities and counties served by
the system. The list is sequenced in descending order by
population size. Each record points to a record in
the class list.

Search Method: Simple sequential search

2. Class List: Within each Locality, several classes are
defined. From the study by Rothrock, a practical
number of classes would be two or three*.

a. Residential
b. Commercial and governmental (non-residential)

Search Method: Simple sequential search

Record Format:

STARTING ADDRESS     ENDING ADDRESS

* A possible third class could be defined. In current practice
by telephone companies, a Frequent Called Number List (FCNL) is
kept for all frequently referred numbers extracted from the
other two classes. Each record in the class list points to a
data list.

3. Data List: There is a data list for each class within a
Locality. There are two types defined, an R-file for
residential listings and a B-file for business and
government listings (i.e. non-residential).

a. R-file
Each record in the R-file shall contain six fields:
(i) last - for family name
(ii) first - for next name given
(iii) house - for house number (e.g., 11532)
(iv) street - for street number (e.g., 125 Avenue)
(v) phone - for telephone number
(vi) pointer - for special record linkage purposes
(see Chapter VI, possible future development)
According to Rothrock, of all requests on residential
listings, very seldom was other information (e.g., middle
initial, title) given to help increase the discriminating
power. In fact, 36% of the requests can furnish only family
name and next name.

b. B-file

In the B-file, each record also contains six fields:

(i) name - listed company name

(ii) type - business type

(iii) house - house number

(iv) street - street number

(v) phone - telephone number

(vi) pointer - future consideration

In this division, 40% of all the requests can only
furnish NAME and TYPE, and 29% can supply the street name
in addition to NAME and TYPE.

The emphasis of the system should be placed upon the handling
of the B-file because, according to Rothrock's survey of seven
larger cities, over 72% of all the requests for telephone directory
assistance are for business listings. This figure confirms the
finding of Edmonton Telephones. Therefore, it is clear that any
automatic telephone directory should stress the efficient handling
of business listings.

The first four fields of both types are used as a SEARCH key;
(i) is the major key; (ii) is used only if (i) alone fails to
identify a unique record. Other keys are used in the same fashion.
Records are arranged in ascending order of the combined keys.
Search Method - Generic Search
a. a random search using Key (i), last name
b. a sequential search using other keys

C. File Creation and Maintenance

1. File Creation

The creation of the master file involves three steps:

a. Conversion: The data base is originally on cards. All
information is in alphanumeric code
exactly like a telephone book. The card
file is read and converted into the all
numeric DD Code (rule of converting as
described before). The converted file
is then stored on disk.

b. Sort: The disk file from (a) is read and sorted into
ascending order. The choice of sorting algorithm
(e.g., tournament sort) depends on the size of
the file. The sorted file is stored
back on the disk.

c. Creation: An indexed sequential file is created from
the disk file. The result is a LISA file,
again on disk.

2. File Search (Generic Key Search)

a. Random Search: A random search is made on the major
key given (last name). After the first
record with the major key is found, a
sequential key search is performed.

b. Sequential Search: After the successful retrieval of
the first record of the given
major key, requests are made by
the sequential read function until a
major key change. If information
on a certain field is not given,
that particular field is masked
(no comparison will be made on
that field).

c. A report on the search will be printed out.
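
The generic key search can be sketched on an in-memory sorted list standing in for the indexed sequential file. This is an illustration only: the record layout follows the R-file fields, `bisect_left` plays the role of the random search on the major key, and the function name and the sample data used to exercise it are hypothetical.

```python
from bisect import bisect_left


def generic_search(master, last, first=None, house=None, street=None):
    """Return all records matching the enquiry.

    master is a list of (last, first, house, street, phone) tuples in
    ascending order of the combined keys, all in DD Code. A field given
    as None is masked, i.e. no comparison is made on it.
    """
    wanted = (last, first, house, street)
    # Random search: locate the first record carrying the major key.
    position = bisect_left(master, (last,))
    matches = []
    # Sequential search: read on until the major key changes.
    for record in master[position:]:
        if record[0] != last:
            break
        if all(w is None or w == field for w, field in zip(wanted, record)):
            matches.append(record)
    return matches
```

One match yields the telephone number, no match yields a 'not found' message, and several matches prompt the caller for a more specific enquiry, as described for the SEARCH program above.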

3. File Update

There are three different functions involved in an update.
All transactions come from a card file.

a. Delete: Cancellation of old listings. A DELETE routine
will delete the record with matching key
(major and minor).

b. Replace: Mainly changes made on old listings (e.g.,
change of address). A REPLACE routine will
replace the whole record with the new record.

c. Insert: Entry of new listings. An INSERT routine
inserts the given record into the file.


D. System Performance Measurement (CDC Linked Index Sequential File )-LISA 


1; “Update 


Updates to the master file were estimated at about 10% of
the file. This becomes a crucial consideration in using this
system. With many updates, the file organization may be such
that access times are greatly degraded by large numbers of
overflow entries. A suggested solution would be to reorganize
the file regularly (i.e., build a new edition of the file).

2. Mass Storage Use

Mass storage use is the percentage of block space occupied
by the user's records compared to the block space available
for the user's records.

The following parameters are to be considered:

A: number of accesses

B: number of buffers

R: number of data records

S_b: block size in words

S_k: key size in words, fractions to be rounded to the
next highest word

S_r: record size in words, fractions to be rounded to the
next highest word

OB: number of overflow blocks

IB: number of index records per block

F: fill percentage for data block

NB: number of data blocks

P: total percentage of record increase during file life

where a fractional result must be rounded to the next integer.
3. Random Record Retrieval and File Maintenance 

The actual number of accesses of mass storage for retrie- 

val or updating of a record is influenced by a number of 

factors: 

a. the file size 

b. the number of I/O buffers 

c. the block size

d. the key size

e. the number of data block overflows 


To determine the number of accesses, the following para- 


meters have to be calculated: 


a. number of index records per block

$IB = \frac{S_b - 2}{S_k + 1}$

(fractional value to be truncated)

b. number of data records per block (blocking factor)

$BF = \frac{S_b - 2}{S_r + 1}$

c. number of data blocks

$NB = \frac{R}{BF} \times \frac{100}{F} = \frac{200{,}000}{BF} \times \frac{100}{F} = 38{,}000$

d. number of secondary index blocks

$SIB = \frac{NB}{IB}$

e. number of primary index blocks

$PIB = \frac{SIB}{IB}$
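
The block counts in (c), (d) and (e) can be worked through with a short sketch. The function name and the sample values BF = 7, F = 75 and IB = 85 are assumptions chosen only to illustrate the arithmetic; rounding each fractional result up to the next integer is also an assumption of the sketch.

```python
def lisa_block_counts(R, BF, F, IB):
    """Return (NB, SIB, PIB) for R records.

    R:  number of data records
    BF: data records per block (blocking factor)
    F:  fill percentage for a data block
    IB: index records per block
    Fractions are rounded up (assumption), using integer arithmetic.
    """
    NB = -(-(R * 100) // (BF * F))   # ceil(R / BF * 100 / F)
    SIB = -(-NB // IB)               # ceil(NB / IB)
    PIB = -(-SIB // IB)              # ceil(SIB / IB)
    return NB, SIB, PIB


# Illustrative values only (assumed, not taken from the installation):
print(lisa_block_counts(R=200_000, BF=7, F=75, IB=85))
```

With these assumed values the number of data blocks comes out near the 38,000 quoted for a file of 200,000 records.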


The performance of LISA in retrieving random records 
may be measured by the average number of mass storage 


accesses (A) to retrieve a record. 
CHAPTER VI
CONCLUSION

As stated, the purpose of this report is to outline an automated 
telephone directory assistance system which could function without 
conventional human operators to provide the link between a customer 
and the computer. 

The design is aimed at a generalized system which would serve a
country or even the whole continent. However, due to a lack of
resources, the experiment to test feasibility and efficiency was
carried out only at the basic system level. Thus, there are several
areas which can be considered for future development.

A. The design of a cross-country telephone directory assistance
system network which serves the needs of a country. Basically, all
discussions above can be applied to building local systems which can
then be linked together to become a "super-system". Communication
networks have been experimented with extensively at the university
level in the United States. The Canadian Government is seriously
considering implementing a similar system here in Canada. The
possibility of and necessity for a telephone directory assistance
network are definitely there. There are several points to be
investigated:

1. a unified file structure for all local systems for
standardization;

2. a mechanism for linking local systems (similar to a long-
distance telephone call);

3. a policy on responsibility and liability for the super-system;

4. a policy on responsibility and liability for each sub-system;

5. a charging scheme to the users. First of all, it is obvious
that the charging for such a service is inevitable. Secondly,
charges for a local inquiry should be different from an out-
of-town inquiry;

6. a profit sharing scheme among the sub-systems. When an out-
of-town inquiry is made, the division of the revenue must be
determined;

7. the administration of the super-system.

As we can see, a network of this nature has great potential for
other applications to serve the community. For instance, credit card
agencies will find such a system very useful in tracing
debtors who move to another part of the country.

In our society, the telephone has become an integral part of our
home. A telephone number could certainly be considered as a unique
identification number. Perhaps using a unique telephone number would
be a step toward a unified person identification number system.

B. In the basic system, four fields (last name, first name,
house number, and street number) must match ("don't know" is
considered as a match) whatever is on a record before the
telephone number on the record will be given. In order to
maximize the chance of a match (if it is considered desirable)
and so warrant at least one telephone number being given, a
weighted scheme can be employed. That is, after each
comparison, the degree of "equalness" is assessed and a weight
is assigned. After all necessary comparisons are done,
the record with the highest weight is
considered a match. Two problems arise naturally:


1. how to determine the weight to be assigned. Blair's letter 

weight and position weight can be consulted. 

2. how many comparisons should be made. Would it be reasonable 

to compare all records that match last name field perfectly? 

The advantage of a weighted scheme is that "minor errors" in 
spelling sometimes can be "overlooked" in correct retrieval of a tele- 
phone number. The obvious disadvantage is the higher probability of 
erroneous retrieval of a telephone number. 
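
A weighted scheme of this kind might be sketched as below. The weight used here, the fraction of character positions that agree within a field, is purely an assumption for illustration; it is not Blair's letter weight or position weight. The function names and sample telephone numbers are hypothetical.

```python
def field_weight(given, stored):
    """Degree of "equalness" between two field values, from 0.0 to 1.0."""
    if given is None:                 # "don't know" is considered a match
        return 1.0
    length = max(len(given), len(stored))
    if length == 0:
        return 1.0
    agreeing = sum(a == b for a, b in zip(given, stored))
    return agreeing / length


def best_match(query, records):
    """Return the telephone number of the record with the highest weight.

    query:   (last, first, house, street); any item may be None
    records: list of ((last, first, house, street), phone) pairs
    """
    if not records:
        return None

    def total(record):
        return sum(field_weight(q, s) for q, s in zip(query, record[0]))

    return max(records, key=total)[1]
```

In this sketch a transposed pair of letters in the last name lowers the weight of the intended record but does not eliminate it, which is the "overlooking" of minor errors described above; the same tolerance is what raises the risk of an erroneous retrieval.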

C. In the basic system, a sequential search on the file is performed
after the first record with the correct last name has been located by
a random search. It is done in this fashion because of the low
frequency of occurrence of records with the same last name. A modified
version would be a complete tree structure as described in the
generalized system.
Records with identical sub-fields (first name, house number, street 
number) would be linked as a list structure. 

D. A simultaneous questioning-searching scheme. Search shall be
carried out simultaneously with the collection of data from the caller.
As each field is completed, its search will be initiated. If a unique
answer is recognized before all fields of information are supplied,
the system shall terminate the questioning and give the answer to
the caller. There are two disadvantages:

1. during the time a customer is keying the information, it ties
up the terminal;

2. erroneous specification by the caller results in wasteful
search time, whereas the system described in this report
would only start searching when a "send" signal is given by
the caller after he has correctly keyed in all information.
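
The simultaneous questioning-searching scheme in D can be sketched as a progressive narrowing of the candidate records. This is a minimal sketch under assumed conventions: records are tuples whose first items are the search fields in the order they are keyed, and the function name is hypothetical.

```python
def narrow_as_keyed(records, fields_in_order):
    """Filter candidates field by field, stopping once the answer is unique.

    Returns (remaining candidates, number of fields actually used).
    """
    candidates = list(records)
    for position, value in enumerate(fields_in_order):
        candidates = [r for r in candidates if r[position] == value]
        if len(candidates) <= 1:
            # Unique answer (or no answer): questioning can stop here.
            return candidates, position + 1
    return candidates, len(fields_in_order)
```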


E. Creation of an inverted index on telephone numbers for the
master file. This would help in the aspects of accounting
and charging. Also, it provides record linkage capability for
other applications such as law enforcement.
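
An inverted index of this kind can be sketched as a mapping from telephone number to the positions of the matching records in the master file. The record layout (a dictionary with a "phone" entry) and the function name are assumptions of the sketch.

```python
def build_inverted_index(master):
    """Map each telephone number to the positions of its records."""
    index = {}
    for position, record in enumerate(master):
        index.setdefault(record["phone"], []).append(position)
    return index
```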


F. Creation of a list of newly assigned telephone numbers
corresponding to the old telephone number of the same customer.
This would enable a caller to find out a new telephone number
even with insufficient knowledge of the regular search information
(first name, etc.). In the future, he can dial the correct number.


In conclusion, all the above suggested future developments are 
geared to a more versatile and powerful telephone directory assistance 
system. Whether they are applicable and feasible for implementation 
depends largely on the needs of each individual local telephone 
company. We may consider the following: 

1. what level of service the local telephone company wants
to achieve;

2. how much the local telephone company is willing to spend
in order to attain that level of service;

3. whether the time constraint is an important factor (e.g.,
Edmonton Telephones finds it must have a new system to
replace the old manual system by 1974);

4. the area of standardization of file and data structure
must be examined carefully to ensure future compatibility
with other systems to make a network.


As it is generally conceded that no system is perfect, an open-
ended system would definitely be more dynamic and adaptable to new
requirements and changing demands. In this era of rapid technological
advancement, we cannot afford to let the field of information
processing stay behind. Information is wanted and wanted fast. It is
certainly unwise to try to design a perfect system at the expense of
time. Instead, a simple functional system that is easy to upgrade and
flexible enough to utilize future computing hardware and software
facilities would prove itself to be most desirable and economical.
Looking ahead into the future, with the advent of this direct
dialed system, potentially every household with a telephone set has a
computer-linked terminal. Prophecies of housewives shopping without
leaving home, or of a man doing business with a computer via his
telephone set, can be fulfilled in the near future. What is more,
conversion to this "terminal age" is easy and painless as the
telephone is readily available in most households and offices nowadays.
Direct Dialing is just a step toward this goal, but the implications
are very far-reaching and significant.
APPENDIX


User Instructions 

Dialing Rules 

Rules of SOUNDEX 

Factors Influencing Choice of Identifying Information 
Swedish Government UID Scheme 


List of Different Names that Generate the same DD Codes 
USER INSTRUCTIONS


For a long distance inquiry, a user should dial 1 first and then
dial the area code (e.g., 604 for B.C.).

Dial for automatic directory assistance (assume still 411).

Wait for a ready tone. 

Enter a sequence of four items of information (called fields),
each field separated by pushing a delimiter button.

a. last name (or finding name for a business listing); use only
the first 7 letters.

b. first name (or business type if it is a business listing);
use only the first three letters.

c. house number; maximum: five letters or digits.

d. street number (or street name); maximum: 3 letters or digits.

If any item of information is less than the allowed maximum
length, the user just dials blanks to fill up the field, or uses
the field terminator.

When all four items are entered, the user shall hear the answer 

from the telephone. 

The answer may be in one of three forms:

a. one telephone number if there is one and only one "find";

b. an apology if there is no "find";

c. if there is more than one "find", several numbers will be given
and the user may try them. (The maximum number of telephone numbers
to give on one request can be determined by each individual
installation.)


The line is disconnected. 
Note:


The efficient use of the system depends on the accuracy 
and completeness of the search information provided by 
the users. Therefore, it is the user's responsibility to 
provide accurate and complete information to ensure 
successful information retrieval (e.g. correct spelling 
of names). 

Often a user may be uncertain about a field. He should 
make separate inquiries with each change on the search 
information. Identical search information would only 
result in an identical answer. 

There shall be an emergency number for "desperate cases".
That is, a human operator shall come to the user's
assistance if he dials this "emergency" number.
DIALING RULES


1. All the letters (except Q and Z) have an equivalent number code as
shown on a standard telephone dial disk (also the same as a touch-
tone telephone panel). Therefore to dial a letter, just dial the
equivalent number on the dial (e.g., the number for D, E, or F
is 3).

2. Dial 1 for Q, Z, or blanks.

3. Numerals are unchanged (e.g., 3 is still 3).


EXAMPLES:

last name (7)*        BLACK          2522511
next name (3)*        JOHN           564
house number (5)*     12345          12345
street number (3)     104 St.        104

last name (7)         ANDERSON       2633776
next name (3)         WILLIAM        945
house number (5)      8204           82041
street number (3)     Jasper Ave.    527


* Note: A delimiter can be implemented to separate fields (a special key
on the new touch-tone keyboard). The user may dial as many characters as
he wishes and close each field with the delimiter (e.g., *). Fields
that are too long shall be truncated while fields that are short shall
be blank filled.
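
The dialing rules can be illustrated with a short sketch that converts one field into its dialed digits. The function and table names are hypothetical; the letter groups are those of the standard dial (without Q and Z), and skipping punctuation such as the period in "St." is an assumption of the sketch.

```python
# Letter groups of the standard telephone dial (Q and Z are absent).
DIAL_GROUPS = {
    "ABC": "2", "DEF": "3", "GHI": "4", "JKL": "5",
    "MNO": "6", "PRS": "7", "TUV": "8", "WXY": "9",
}
LETTER_TO_DIGIT = {ch: d for group, d in DIAL_GROUPS.items() for ch in group}


def dd_field(text, width):
    """Convert one field to dialed digits, truncated or blank filled."""
    dialed = []
    for ch in text.upper():
        if ch in "0123456789":
            dialed.append(ch)                   # rule 3: numerals unchanged
        elif ch in LETTER_TO_DIGIT:
            dialed.append(LETTER_TO_DIGIT[ch])  # rule 1: letter to dial digit
        elif ch in "QZ ":
            dialed.append("1")                  # rule 2: Q, Z and blanks
        # any other character is skipped (assumption of this sketch)
    # Long fields are truncated; short fields are filled with blanks (1).
    return "".join(dialed)[:width].ljust(width, "1")


print(dd_field("BLACK", 7), dd_field("JOHN", 3), dd_field("8204", 5))
```

Because three letters share each digit, different names can produce the same code; for instance ZADUNAYSKI and ZADVNAYSKI differ only in U and V, which are both dialed as 8.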
RULES OF SOUNDEX


The first letter of a surname is retained in its uncoded
form and is termed a prefix.

Other letters of the surname are assigned code numbers as
follows:

B, F, P, V = 1
C, G, J, K, Q, S, X, Z = 2
D, T = 3
L = 4
M, N = 5
R = 6

A, E, I, O, U and Y are not assigned a code but serve as
"separators" (see below); W and H are ignored entirely.

The second of a pair of consecutive identical digits may be
retained as part of the code only if the corresponding
consonants are separated by a vowel or Y. The rule applies in
a similar fashion to a digit which would follow a prefix letter
having the same code.

The coding stops when three digits have been obtained. If
the coding yields fewer than three digits, zeros are used to
complete the code.
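
The rules above can be followed step by step in a short sketch. The function name is hypothetical and the treatment of non-alphabetic characters (they are dropped) is an assumption; otherwise the code follows the rules as stated.

```python
SOUNDEX_CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for letter in letters:
        SOUNDEX_CODES[letter] = digit


def soundex(surname):
    """Return the prefix letter followed by three code digits."""
    letters = [c for c in surname.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    prefix = letters[0]
    digits = []
    previous = SOUNDEX_CODES.get(prefix)   # code of the prefix, if it has one
    for ch in letters[1:]:
        if ch in "HW":
            continue                       # W and H are ignored entirely
        code = SOUNDEX_CODES.get(ch)
        if code is None:                   # vowel or Y: acts as a separator
            previous = None
            continue
        if code != previous:               # drop an unseparated repeat
            digits.append(code)
            if len(digits) == 3:
                break                      # coding stops at three digits
        previous = code
    return prefix + "".join(digits).ljust(3, "0")
```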
FACTORS INFLUENCING CHOICE OF
IDENTIFYING INFORMATION


MRC (Medical Research Council) Report #3 (1968)

Appendix II: Record Linkage in Canada


Social Insurance Number (SIN)

Full name

Mother's maiden name

Day, month and year of birth

Place of birth
(city, province if in Canada, or city and country if not
in Canada)

Sex

Possible addition: woman's maiden name, if husband's name were used.

Errors:

a. failure to link records that ought to be linked.

b. linkage of records that ought not to be linked.

A measure of "redundancy": make multiple comparisons of records.

Acceptable error level depends on the particular project.

Feasible linkage requires:
1. name: first, middle initial, last
2. mother's maiden name
3. date of birth
4. place of birth
5. sex
6. marital status
Additional information:
1. first name of spouse
2. maiden name if married woman
3. place of residence (city, province)
4. father's first name
5. mother's first name
Date of registration of the record, used:
1. entry on record
2. parameter in the record.


Best solution is a universal identification number.
SWEDISH GOVERNMENT UID SCHEME
Personal Identification Number
The personal identification number has ten digits with a dash after the
first six. It consists of the following three parts:

a) birth date (6 digits)
b) birth number (3 digits)
c) control figure

Example: 380425-6653

a) The birth date is indicated by six digits in the following order:

two last digits of year of birth (38)
month (04)
day (25)
b) The birth number has three digits, odd for men and even for women.
It can be any of the numbers 001 - 999. Persons born on the same
day shall have different numbers.
c) The control figure is added to the birth number and can as a rule
be used to test that no wrong figures have been given in birth date
and birth number. It is calculated in the following way:
1. All the single digits in birth year, month and day, and birth
number are multiplied alternately by 2 and 1.

 3   8   0   4   2   5   6   6   5
 2   1   2   1   2   1   2   1   2
 6   8   0   4   4   5  12   6  10

2. Add the received figures. Note that 12 = 1 + 2.
6 + 8 + 0 + 4 + 4 + 5 + 1 + 2 + 6 + 1 + 0 = 37

3. The last digit of the sum is deducted from the number 10.
10 - 7 = 3

4. The figure arrived at is the control figure. If this figure is 10,
the control figure will be 0.
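
The four steps of the control figure calculation can be checked with a short sketch. The function name is hypothetical; the input is the nine digits of birth date and birth number written without the dash.

```python
def control_figure(nine_digits):
    """Compute the control figure from birth date plus birth number."""
    total = 0
    for position, ch in enumerate(nine_digits):
        # Step 1: multiply alternately by 2 and 1, starting with 2.
        product = int(ch) * (2 if position % 2 == 0 else 1)
        # Step 2: add the single digits of each product (12 counts as 1 + 2).
        total += product // 10 + product % 10
    # Steps 3 and 4: deduct the last digit of the sum from 10; 10 becomes 0.
    return (10 - total % 10) % 10


print(control_figure("380425665"))
```

For the example number 380425-665 the sum is 37, so the sketch gives the control figure 3, in agreement with the worked calculation.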


Birth numbers were introduced in Sweden on January 1, 1947. The
control figure was added on January 1, 1968.


Every person registered for census in Sweden shall have a personal
identification number, regardless of citizenship. Such numbers, with
the exception of the control figure, have been given to all persons
registered in Sweden on January 1, 1947, and to all who thereafter have
been registered.


In some instances personal identification numbers are given to
persons not census registered in Sweden, for example, persons who do
military service in Sweden, pay taxes in Sweden, or are registered
with the Swedish Social Insurance Service.
PARTIAL LIST OF DIFFERENT NAMES*
THAT GENERATE THE SAME DD CODES

COMMON NAMES** ARE INDICATED


*Based on 14556 student records


**Common names table


1. Based on analysis of U.S. Social Security
records (117,358,888)


2. Includes 1586 names representing 48% of persons
on record


3. Rank number indicates ranking of the 121 most common
names
LIST OF DIFFERENT NAMES HAVING SAME CODE


LAST NAME
ZADUNAYSKI
ZADVNAYSKI


CARAY